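We fine-tune GPT-3 to answer long-form questions using a text-based web-browsing environment, which allows the model to search and navigate the web. By setting up the task so that it can be performed by humans, we are able to train models on the task using imitation learning, and then optimize answer quality with human feedback. To make human evaluation of factual accuracy easier, models must collect references while browsing in support of their answers. We train and evaluate our models on ELI5, a dataset of questions asked by Reddit users. Our best model is obtained by fine-tuning GPT-3 using behavior cloning, and then performing rejection sampling against a reward model trained to predict human preferences. This model's answers are preferred by humans 56% of the time to those of our human demonstrators, and 69% of the time to the highest-voted answer from Reddit.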
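The rejection-sampling step described above can be sketched as best-of-n selection: draw several candidate answers and keep the one a reward model scores highest. This is a minimal illustration only; `best_of_n`, `sample_answer`, and `reward_model` are hypothetical names, and the toy stand-ins below are assumptions, not the models from the paper.

```python
import random
from typing import Callable, List


def best_of_n(question: str,
              sample_answer: Callable[[str], str],
              reward_model: Callable[[str, str], float],
              n: int = 4) -> str:
    """Draw n candidate answers and return the one the reward model prefers."""
    if n < 1:
        raise ValueError("n must be at least 1")
    candidates: List[str] = [sample_answer(question) for _ in range(n)]
    # max() keeps the first candidate among ties.
    return max(candidates, key=lambda answer: reward_model(question, answer))


# Hypothetical stand-ins for a fine-tuned policy and a learned reward model.
def toy_policy(question: str) -> str:
    return random.choice(["short", "a longer answer", "the longest answer of all"])


def toy_reward(question: str, answer: str) -> float:
    return float(len(answer))  # assumption: longer is "better" in this toy setup


if __name__ == "__main__":
    print(best_of_n("Why is the sky blue?", toy_policy, toy_reward, n=8))
```

Larger `n` spends more sampling compute per question in exchange for a better chance that at least one candidate scores well.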
translated by 谷歌翻译
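The range of applications of artificial intelligence (AI) is vast, as is the potential for harm. Growing awareness of potential risks from AI systems has spurred action to address those risks, while eroding confidence in AI systems and the organizations that develop them. A 2019 study found over 80 organizations that had published and adopted "AI ethics principles", and more have joined since. But the principles often leave a gap between the "what" and the "how" of trustworthy AI development. Such gaps have enabled questionable or ethically dubious behavior, which casts doubt on the trustworthiness of specific organizations and of the field more broadly. There is thus an urgent need for concrete methods that both enable AI developers to prevent harm and allow them to demonstrate their trustworthiness through verifiable behavior. Below, we explore mechanisms (drawn from arXiv:2004.07213) for creating an ecosystem in which AI developers can earn trust, provided they are trustworthy. Better assessment of developer trustworthiness could inform user choice, employee actions, investment decisions, legal recourse, and emerging governance regimes.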
State-of-the-art computer vision systems are trained to predict a fixed set of predetermined object categories. This restricted form of supervision limits their generality and usability, since additional labeled data is needed to specify any other visual concept. Learning directly from raw text about images is a promising alternative that leverages a much broader source of supervision. We demonstrate that the simple pre-training task of predicting which caption goes with which image is an efficient and scalable way to learn state-of-the-art image representations from scratch on a dataset of 400 million (image, text) pairs collected from the internet. After pre-training, natural language is used to reference learned visual concepts (or describe new ones), enabling zero-shot transfer of the model to downstream tasks. We study the performance of this approach by benchmarking on over 30 different existing computer vision datasets, spanning tasks such as OCR, action recognition in videos, geo-localization, and many types of fine-grained object classification. The model transfers non-trivially to most tasks and is often competitive with a fully supervised baseline without the need for any dataset-specific training. For instance, we match the accuracy of the original ResNet-50 on ImageNet zero-shot without needing to use any of the 1.28 million training examples it was trained on. We release our code and pre-trained model weights at https://github.com/OpenAI/CLIP.
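The pre-training task of predicting which caption goes with which image can be illustrated with a symmetric contrastive loss over a batch of paired embeddings. The sketch below is dependency-free and rests on assumptions: the function names are hypothetical, the embeddings are assumed to be non-zero vectors already produced by some image and text encoders, and the temperature of 0.07 is an illustrative value rather than a setting taken from the abstract.

```python
import math
from typing import List, Sequence


def _normalize(v: Sequence[float]) -> List[float]:
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def _cross_entropy_diag(logits: List[List[float]]) -> float:
    """Mean cross-entropy where the correct class for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for numerical stability
        log_z = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_z - row[i]
    return total / len(logits)


def contrastive_loss(image_emb: Sequence[Sequence[float]],
                     text_emb: Sequence[Sequence[float]],
                     temperature: float = 0.07) -> float:
    """Symmetric loss: pick the right caption per image and the right image per caption."""
    imgs = [_normalize(v) for v in image_emb]
    txts = [_normalize(v) for v in text_emb]
    # Cosine similarity of every image with every caption, scaled by temperature.
    logits = [[sum(a * b for a, b in zip(i, t)) / temperature for t in txts]
              for i in imgs]
    logits_t = [list(col) for col in zip(*logits)]
    return 0.5 * (_cross_entropy_diag(logits) + _cross_entropy_diag(logits_t))
```

The loss is low when the i-th image is most similar to the i-th caption and high when pairs are shuffled, which is what drives the two encoders toward a shared embedding space.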
Overfitting is a problem in Convolutional Neural Networks (CNNs) that causes poor generalization of models on unseen data. To remedy this problem, many new and diverse data augmentation (DA) methods have been proposed to supplement or generate more training data, and thereby increase its quality. In this work, we propose a new data augmentation algorithm: VoronoiPatches (VP). We primarily utilize non-linear recombination of information within an image, fragmenting and occluding small information patches. Unlike other DA methods, VP uses small convex polygon-shaped patches in a random layout to transport information around within an image. Sudden transitions created between patches and the original image can, optionally, be smoothed. In our experiments, VP outperformed current DA methods regarding model variance and overfitting tendencies. We demonstrate that data augmentation utilizing non-linear recombination of information within images, together with non-orthogonal shapes and structures, improves CNN model robustness on unseen data.
One of today's goals for industrial robot systems is to allow fast and easy provisioning for new tasks. Skill-based systems that use planning and knowledge representation have long been one possible answer to this. However, especially with contact-rich robot tasks that need careful parameter settings, such reasoning techniques can fall short if the required knowledge is not adequately modeled. We show an approach that combines task-level planning and reasoning with targeted learning of skill parameters for the task at hand. Starting from a task goal formulated in PDDL, the learnable parameters in the plan are identified, and an operator can choose reward functions and parameters for the learning process. A tight integration with a knowledge framework allows a prior to be formed for learning, and the use of multi-objective Bayesian optimization makes it easier to balance aspects such as safety and task performance, which can often affect each other. We demonstrate the efficacy and versatility of our approach by learning skill parameters for two different contact-rich tasks and show their successful execution on a real 7-DOF KUKA-iiwa.
We present a smoothly broken power law functional form that accurately models and extrapolates the scaling behaviors of deep neural networks (i.e. how the evaluation metric of interest varies as the amount of compute used for training, number of model parameters, training dataset size, or upstream performance varies) for each task within a large and diverse set of upstream and downstream tasks, in zero-shot, prompted, and fine-tuned settings. This set includes large-scale vision and unsupervised language tasks, diffusion generative modeling of images, arithmetic, and reinforcement learning. When compared to other functional forms for neural scaling behavior, this functional form yields extrapolations of scaling behavior that are considerably more accurate on this set. Moreover, this functional form accurately models and extrapolates scaling behavior that other functional forms are incapable of expressing such as the non-monotonic transitions present in the scaling behavior of phenomena such as double descent and the delayed, sharp inflection points present in the scaling behavior of tasks such as arithmetic. Lastly, we use this functional form to glean insights about the limit of the predictability of scaling behavior. Code is available at https://github.com/ethancaballero/broken_neural_scaling_laws
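To make the idea of a smoothly broken power law concrete, the sketch below evaluates one common parameterization: a base power law multiplied by one smooth transition factor per break. The abstract does not state the formula, so the exact form, the function name `broken_power_law`, and the parameter names (`a`, `b`, `c0`, and per-break `c`, `d`, `f`) are assumptions made for illustration; the linked repository is the authoritative source.

```python
from typing import Iterable, Tuple


def broken_power_law(x: float,
                     a: float,
                     b: float,
                     c0: float,
                     breaks: Iterable[Tuple[float, float, float]] = ()) -> float:
    """Evaluate y = a + b * x**(-c0) * prod_i (1 + (x / d_i)**(1 / f_i))**(-c_i * f_i).

    Each break is a tuple (c, d, f), all assumed here for illustration:
      c: change in log-log slope after the break,
      d: location of the break on the x axis (d > 0),
      f: sharpness of the transition (f > 0; smaller means sharper).
    """
    y = b * x ** (-c0)
    for c, d, f in breaks:
        y *= (1.0 + (x / d) ** (1.0 / f)) ** (-c * f)
    return a + y


if __name__ == "__main__":
    # With no breaks this reduces to an ordinary power law plus an offset.
    print(broken_power_law(4.0, a=1.0, b=2.0, c0=0.5))
```

Well before a break the extra factor is close to 1, and well after it the factor behaves like `(x / d) ** (-c)`, so the log-log slope changes from `-c0` to `-(c0 + c)` with a transition whose width is set by `f`.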
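We provide the first formal definition of reward hacking, a phenomenon where optimizing an imperfect proxy reward function, $\tilde{\mathcal{R}}$, leads to poor performance according to the true reward function, $\mathcal{R}$. We say that a proxy is unhackable if increasing the expected proxy return can never decrease the expected true return. Intuitively, it might be possible to create an unhackable proxy by leaving some terms out of the reward function (making it "narrower") or by overlooking fine-grained distinctions between roughly equivalent outcomes, but we show that this is usually not the case. A key insight is that the linearity of reward (in state-action visit counts) makes unhackability a very strong condition. In particular, for the set of all stochastic policies, two reward functions can only be unhackable if one of them is constant. We thus turn our attention to deterministic policies and finite sets of stochastic policies, where non-trivial unhackable pairs always exist, and establish necessary and sufficient conditions for the existence of simplifications, an important special case of unhackability. Our results reveal a tension between using reward functions to specify narrow tasks and aligning AI systems with human values.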
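For a finite set of policies, the unhackability condition stated above can be checked directly: no pair of policies may exist where the expected proxy return goes up while the expected true return goes down. The sketch below assumes the expected returns have already been computed for each policy; the function name `is_unhackable` and the list-based representation are illustrative choices, not the paper's notation.

```python
from itertools import combinations
from typing import Sequence


def is_unhackable(true_returns: Sequence[float],
                  proxy_returns: Sequence[float]) -> bool:
    """Check a finite policy set: entry k holds the expected return of policy k.

    The pair is hackable if some two policies are strictly ordered one way by
    the proxy and strictly ordered the opposite way by the true reward.
    """
    if len(true_returns) != len(proxy_returns):
        raise ValueError("need one true and one proxy return per policy")
    for i, j in combinations(range(len(true_returns)), 2):
        proxy_up = proxy_returns[i] < proxy_returns[j]
        proxy_down = proxy_returns[i] > proxy_returns[j]
        true_up = true_returns[i] < true_returns[j]
        true_down = true_returns[i] > true_returns[j]
        if (proxy_up and true_down) or (proxy_down and true_up):
            return False
    return True
```

A proxy that ties policies the true reward distinguishes passes this check, while a proxy that reverses even one strict preference fails it. A constant proxy trivially passes.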
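Modern machine learning research relies on relatively few carefully curated datasets. Even in these datasets, and typically in "untidy" or raw data, practitioners are faced with significant issues of data quality and diversity that can be prohibitively labor-intensive to address. Existing methods for dealing with these challenges tend to make strong assumptions about the particular issues at play, and often require a priori knowledge or metadata such as domain labels. Our work is orthogonal to these methods: we instead focus on providing a unified and efficient framework for Metadata Archaeology, that is, uncovering and inferring metadata of examples in a dataset. We curate different subsets of data that might exist in a dataset (e.g. mislabeled, atypical, or out-of-distribution examples) using simple transformations, and leverage differences in learning dynamics between these probe suites to infer metadata of interest. Our method is on par with far more sophisticated mitigation methods across different tasks: identifying and correcting mislabeled examples, classifying minority-group samples, prioritizing points relevant for training, and enabling scalable human auditing of relevant examples.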
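One simple way to use differences in learning dynamics between probe suites is to compare an example's per-epoch loss trajectory with trajectories of probe examples whose metadata is known, and take a nearest-neighbor vote. The sketch below is an assumption-laden illustration: the abstract does not specify the classifier, and `infer_metadata`, the trajectory format, and the choice of Euclidean distance are hypothetical.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple


def infer_metadata(trajectory: Sequence[float],
                   probes: List[Tuple[str, Sequence[float]]],
                   k: int = 3) -> str:
    """Label an example by majority vote among its k nearest probe trajectories.

    trajectory: per-epoch training losses of the example to classify.
    probes: (metadata_label, per-epoch losses) pairs, same length as trajectory.
    """
    if not probes:
        raise ValueError("need at least one probe trajectory")
    ranked = sorted(probes, key=lambda p: math.dist(trajectory, p[1]))
    votes = Counter(label for label, _ in ranked[:k])
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    probes = [
        ("typical", [2.0, 0.5, 0.1]),      # loss drops quickly
        ("typical", [2.1, 0.6, 0.2]),
        ("mislabeled", [2.0, 1.9, 1.8]),   # loss stays high
        ("mislabeled", [2.2, 2.0, 1.7]),
    ]
    print(infer_metadata([2.1, 1.8, 1.6], probes, k=3))
```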
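Robot skill systems aim to reduce robot setup time for new manufacturing tasks. Yet, for dexterous, contact-rich tasks, it is often difficult to find the right skill parameters. One strategy is to learn these parameters by allowing the robot system to learn directly on the task. For a learning problem, a robot operator can typically specify the type and range of values of the parameters. Nevertheless, given their prior experience, robot operators should be able to help the learning process further by providing educated guesses about where in the parameter space potential optimal solutions could be found. Interestingly, such prior knowledge is not exploited in current robot learning frameworks. We introduce an approach that combines user priors and Bayesian optimization to allow fast optimization of robot industrial tasks at robot deployment time. We evaluate our method on three tasks that are learned in simulation as well as on two tasks that are learned directly on a real robot system. Additionally, we transfer knowledge from corresponding simulation tasks by automatically constructing priors from well-performing configurations for learning on the real system. To handle potentially contradicting task objectives, the tasks are modeled as multi-objective problems. Our results show that operator priors, both user-specified and transferred, vastly accelerate the discovery of rich Pareto fronts, and typically produce final performance far superior to the proposed baselines.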
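Because the tasks are modeled as multi-objective problems, results are reported as Pareto fronts rather than a single best configuration. As a small illustration of that concept only (not of the Bayesian optimization procedure itself), the sketch below filters a set of evaluated configurations down to the non-dominated ones. The function name and the convention that every objective is maximized are assumptions.

```python
from typing import List, Sequence, Tuple


def pareto_front(points: Sequence[Tuple[float, ...]]) -> List[Tuple[float, ...]]:
    """Return the points not dominated by any other point (all objectives maximized)."""

    def dominates(p: Tuple[float, ...], q: Tuple[float, ...]) -> bool:
        # p dominates q if it is at least as good everywhere and better somewhere.
        return (all(a >= b for a, b in zip(p, q))
                and any(a > b for a, b in zip(p, q)))

    return [p for p in points if not any(dominates(q, p) for q in points)]


if __name__ == "__main__":
    # Hypothetical (task performance, safety margin) scores per configuration.
    evaluated = [(1.0, 5.0), (2.0, 4.0), (1.0, 4.0), (3.0, 1.0)]
    print(pareto_front(evaluated))
```

This quadratic-time filter is adequate for the few hundred evaluations typical of robot experiments; larger sets would call for a sort-based method.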
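Learning models that generalize under different distribution shifts in medical imaging has been a long-standing research challenge. Vision research practitioners have made several proposals for efficient and robust visual representation learning, especially in the sensitive and critical biomedical domain. In this paper, we propose an approach to out-of-distribution generalization of chest X-ray pathologies that uses a simple balanced batch sampling technique. We observe that balanced sampling across multiple training datasets improves performance over baseline models trained without balancing.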
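Balanced batch sampling across several training datasets can be sketched as drawing the same number of examples from each source for every batch, regardless of how large each source is. The abstract does not give the exact procedure, so the details below (sampling with replacement, a batch size divisible by the number of datasets, and the name `balanced_batch`) are assumptions.

```python
import random
from typing import Dict, List, Sequence, Tuple


def balanced_batch(datasets: Dict[str, Sequence[object]],
                   batch_size: int,
                   rng: random.Random) -> List[Tuple[str, object]]:
    """Build one batch with an equal number of examples from every dataset.

    Examples are drawn with replacement, so small datasets are oversampled
    rather than limiting the batch.
    """
    names = sorted(datasets)
    if not names or batch_size % len(names) != 0:
        raise ValueError("batch_size must be a positive multiple of the dataset count")
    per_dataset = batch_size // len(names)
    batch: List[Tuple[str, object]] = []
    for name in names:
        batch.extend((name, rng.choice(datasets[name])) for _ in range(per_dataset))
    rng.shuffle(batch)  # avoid grouping examples by source within the batch
    return batch


if __name__ == "__main__":
    # Hypothetical sources of very different sizes.
    sources = {"hospital_a": list(range(1000)), "hospital_b": list(range(30))}
    print(balanced_batch(sources, batch_size=8, rng=random.Random(0)))
```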